A snapshot of the Physcomitrella N-terminome reveals N-terminal methylation of organellar proteins

Key message: Analysis of the N-terminome of Physcomitrella reveals N-terminal monomethylation of nuclear-encoded, mitochondria-localized proteins.

Abstract: Post- or co-translational N-terminal modifications of proteins influence their half-life and mediate protein sorting to organelles via cleavable N-terminal sequences that are recognized by the respective translocation machinery. Here, we provide an overview of the current modification state of the N-termini of over 4500 proteins from the model moss Physcomitrella (Physcomitrium patens) using a compilation of 24 N-terminomics datasets. Our data reveal distinct proteoforms and modification states and confirm predicted targeting peptide cleavage sites of 1144 proteins localized to plastids and the thylakoid lumen, to mitochondria, and to the secretory pathway. In addition, we uncover extensive N-terminal methylation of mitochondrial proteins. Moreover, we identified PpNTM1 (P. patens alpha N-terminal protein methyltransferase 1) as a candidate for protein methylation in plastids, mitochondria, and the cytosol. These data can now be used to optimize computational targeting predictors and to design customized protein fusions for targeted localization in biotechnology, and they offer novel insights into potential dual targeting of proteins.

Supplementary Information: The online version contains supplementary material available at 10.1007/s00299-024-03329-1.


Introduction
Following translation at the ribosome, the N-terminus of a protein is subjected to a plethora of modifications, among them proteolytic processing and the addition of acetyl, methyl or other functional groups (Fortelny et al. 2015; Meinnel and Giglione 2008). The N-terminus in turn mediates subcellular targeting and is a determinant of protein half-life (Varshavsky 1996; Kunze and Berger 2015; Armenteros et al. 2019; Varshavsky 2019).
Modifications of the N-terminus are introduced co- or posttranslationally, with co-translational acetylation and methionine excision being among the most abundant modifications in eukaryotes (Ree et al. 2018; Giglione and Meinnel 2021). In plants, however, N-terminal acetylation also occurs posttranslationally on plastid stromal proteins after import and cleavage of their targeting peptide (Giglione and Meinnel 2021). In contrast to proteolytic trimming, amino acids can also be added to the apparent N-terminus of a protein in a ribosome-independent manner as part of the N-degron pathway for targeted proteolysis (Varshavsky 1996; Tasaki et al. 2012). Various methods such as COFRADIC (Staes et al. 2011), TAILS (Kleifeld et al. 2010) or HUNTER (Demir et al. 2022) have been established; they permit the characterization of proteases, high-throughput degradomics and profiling of N-terminal acetylation. Accordingly, N-terminomics data are available from public databases such as TopFIND (https://topfind.clip.msl.ubc.ca/) for various organisms including human, mouse, yeast, and Arabidopsis.
In contrast, almost no N-terminomics data were available for the model plant Physcomitrella (Physcomitrium patens; Lueth and Reski 2023). Owing to its evolutionary position at the early divergence of land plants (Rensing et al. 2008), this moss is a versatile model system for evo-devo studies (Horst et al. 2016), plant physiology (Decker et al. 2017; Wiedemann et al. 2018) and the evolution of metabolic pathways (Renault et al. 2017). It has further proven to be a valuable system for proteomic and proteogenomic research, because its simple, axenic cultivation enables highly reproducible and even GMP-compliant culture conditions (Sarnighausen et al. 2004; Heintz et al. 2006; Mueller et al. 2014; Hoernstein et al. 2016, 2018; Fesenko et al. 2019, 2021). Besides its broad application in basic research, Physcomitrella is employed as a production platform for recombinant biopharmaceuticals in GMP-compliant bioreactors (Decker and Reski 2020; Ruiz-Molina et al. 2022; Tschongov et al. 2024).
Here, we provide a snapshot of the N-terminome of the moss Physcomitrella with a focus on the cleavage of N-terminal targeting sequences, N-terminal acetylation and N-terminal monomethylation. The data were compiled from 24 datasets obtained in various experimental setups, each followed by N-terminal peptide enrichment using a modified TAILS approach. We reveal apparent N-terminal methylation not only of plastid and cytosolic proteins but also of mitochondrial proteins. Furthermore, we provide a list of confirmed targeting peptide cleavage sites along with a candidate list of proteins that are dually targeted to plastids and mitochondria or to mitochondria and the cytosol. These data are a resource for basic research, as they contain information about the translation of splice variants as well as about posttranslational and posttranscriptional processing of proteins. They are also relevant for biotechnology: targeting of recombinant proteins to plastids of Nicotiana benthamiana (Maclean et al. 2007) or to the extracellular space in Physcomitrella (Schaaf et al. 2005) enabled high yields of the desired recombinant product. Consequently, our present data also provide a comprehensive resource for tailored recombinant protein production and targeting in Physcomitrella.

Overview of identified N-termini
The present data provide a qualitative overview of the N-terminome of the moss Physcomitrella using a compilation of 24 datasets from N-terminal peptide enrichments of different tissues, treatments and sample processing protocols. The datasets were obtained during method establishment for various purposes unrelated to the analysis performed in this study; hence, the data were assessed only qualitatively and do not allow any cross-sample comparison. Details about the sample type, tissue and other experimental parameters are given in Supplemental Table S1.
Enrichment of N-terminal peptides was performed as described in Hoernstein et al. (2018) with modifications. Free amino groups in the protein sample were blocked by reductive dimethylation according to Kleifeld et al. (2010), and internal peptides generated by proteolysis were depleted according to McDonald and Beynon (2006). MS measurements were performed on an LTQ-Orbitrap Velos Pro (Thermo Scientific), and raw data were processed and searched with Mascot (Matrix Science). All database search results were loaded into Scaffold 5™ (V5.0.1, https://www.proteomesoftware.com/), and proteins were accepted at a ProteinProphet™ (Nesvizhskii et al. 2003) probability of at least 99% with a minimum of one identified peptide. Peptides were accepted at a PeptideProphet™ (Keller et al. 2002) probability of at least 95% and a Mascot ion score of at least 40 (Supplemental Table S2).
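The acceptance criteria above can be expressed as a simple filter. The following Python sketch is purely illustrative. The filtering in this study was done in Scaffold, and the record type, field names and example values below are hypothetical.

```python
from dataclasses import dataclass

# Acceptance thresholds taken from the text (actual filtering was done in Scaffold).
MIN_PROTEIN_PROB = 0.99  # ProteinProphet probability
MIN_PEPTIDE_PROB = 0.95  # PeptideProphet probability
MIN_ION_SCORE = 40.0     # Mascot ion score
MIN_PEPTIDES = 1         # distinct accepted peptides per protein


@dataclass(frozen=True)
class PeptideHit:
    """Hypothetical record for one peptide identification."""
    protein: str
    sequence: str
    peptide_prob: float
    ion_score: float


def passes_peptide_filter(hit: PeptideHit) -> bool:
    """Peptide-level acceptance: both the probability and the ion score cut-off must be met."""
    return hit.peptide_prob >= MIN_PEPTIDE_PROB and hit.ion_score >= MIN_ION_SCORE


def accepted_proteins(protein_probs: dict[str, float], hits: list[PeptideHit]) -> set[str]:
    """Proteins with probability >= 0.99 that are supported by >= 1 accepted peptide."""
    support: dict[str, set[str]] = {}
    for hit in hits:
        if passes_peptide_filter(hit):
            support.setdefault(hit.protein, set()).add(hit.sequence)
    return {
        protein
        for protein, prob in protein_probs.items()
        if prob >= MIN_PROTEIN_PROB and len(support.get(protein, set())) >= MIN_PEPTIDES
    }


# Hypothetical example records (identifiers and values are invented for illustration)
hits = [
    PeptideHit("prot_A", "ASDGK", 0.97, 45.0),
    PeptideHit("prot_B", "MSTLR", 0.99, 35.0),  # fails the ion score cut-off
    PeptideHit("prot_C", "AVEEK", 0.90, 60.0),  # fails the peptide probability cut-off
]
protein_probs = {"prot_A": 0.995, "prot_B": 0.999, "prot_C": 0.999}
print(accepted_proteins(protein_probs, hits))  # {'prot_A'}
```

Protein-level and peptide-level thresholds are kept separate, so a protein with a high probability is still rejected when none of its peptides passes both peptide criteria.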
Using these settings, we identified 11,533 protein N-termini from 32,213 spectra across the 24 datasets, corresponding to 4517 proteins (3920 protein groups), with a decoy FDR (false discovery rate) of 0.4% at the protein level and 0.08% at the peptide level (Supplemental Tables S2, S3). Approximately 20% of the identified N-termini represented either the initiator methionine (start index 1, Figure 1A) or the subsequent amino acid after cleavage of the initiator methionine (start index 2). For approximately 40%, a start index between 2 and 100 was identified, indicating proteolytic processing and cleavage of subcellular targeting sequences. A single experimentally determined N-terminus was observed for approximately 70% of all identified proteins, whereas two or more N-termini were observed for approximately 30% (Figure 1B). This contrasts strongly with previous findings on proteins in Physcomitrella bioreactor supernatants (Hoernstein et al. 2018), where at least two distinct N-termini were observed for approximately 80% of all identified proteins. Further, we analyzed N-terminal modifications with a focus on N-terminal acetylation, monomethylation and the presence of pyro-glutamate (pyroGlu) at the N-terminus. The latter modification can form spontaneously or enzymatically from N-terminal glutamine residues (Schilling et al. 2008). Since pyroGlu can also form after proteolysis during sample processing (Purwaha et al. 2014), only peptides whose preceding P1 amino acid did not match the specificity of the employed protease (e.g., no peptides with K or R as the preceding amino acid in the case of trypsin digests) were considered here.
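The P1 rule used to retain only plausible N-terminal pyroGlu peptides can be sketched as follows. This is a minimal illustration, not the workflow used here. The protease specificities are simplified assumptions (the text only gives the trypsin example), and the sequence is hypothetical.

```python
# Simplified P1 specificities assumed for this sketch (not taken from the text):
# trypsin cleaves after K/R; GluC after E (and D, e.g. in phosphate buffer);
# chymotrypsin mainly after F/Y/W and, less efficiently, after L/M.
PROTEASE_P1 = {
    "trypsin": frozenset("KR"),
    "gluc": frozenset("DE"),
    "chymotrypsin": frozenset("FYWLM"),
}


def is_countable_pyroglu(protein_seq: str, start: int, protease: str) -> bool:
    """Decide whether an N-terminal pyroGlu peptide is counted.

    protein_seq: full protein sequence; start: 1-based position of the peptide's
    first residue. The peptide must start with Gln (the pyroGlu precursor), and the
    preceding P1 residue must not match the protease specificity; otherwise the
    N-terminus may be a digestion product that cyclized during sample processing.
    """
    if protein_seq[start - 1] != "Q":
        return False
    if start == 1:
        return True  # genuine protein N-terminus, no P1 residue
    return protein_seq[start - 2] not in PROTEASE_P1[protease]


seq = "MAKQLVSQTR"  # hypothetical sequence
print(is_countable_pyroglu(seq, 4, "trypsin"))  # False: preceded by K
print(is_countable_pyroglu(seq, 8, "trypsin"))  # True: preceded by S
```

The same peptide can therefore be accepted in a chymotrypsin digest but rejected in a tryptic one, which is why the rule depends on the protease used for each dataset.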
Approximately 25% of all identified N-termini were acetylated, 71% were identified without any modification and 4% were either methylated or carried an N-terminal pyroGlu (Figure 1C). The actual frequency of N-terminal pyroGlu is likely higher, but it cannot be analyzed further because of the ambiguity introduced by the specificity of the employed proteases. At the protein level, approximately 76% of the nuclear-encoded proteins identified with either the retained or the cleaved initiator methionine (1436 protein groups, Supplemental Table S3) were N-terminally acetylated (1097 protein groups, Supplemental Table S3). This degree is slightly below the estimated degree of around 90% N-terminally acetylated proteins in plants (Bienvenut et al. 2012; Linster and Wirtz 2018).

Post-import trimming of plastid proteins
Cleavable N-terminal sequences are required for sub- and extracellular targeting of nuclear-encoded proteins. Their cleavage by specific proteases after translocation across the respective organellar membrane generates a new N-terminus that represents either the final N-terminus of the translocated protein or a new site for further proteolytic processing by organellar proteases. For Physcomitrella, a total of 8681 cleavable N-terminal targeting sequences are predicted (Supplemental Figure S1). We compared our experimentally observed N-termini to these predictions, allowing a tolerance window of ±5 amino acids around a predicted targeting peptide cleavage site (Figure 2A). In the following, a difference of 0 indicates agreement of an observed N-terminal amino acid with a predicted cleavage site (predicted P1 amino acid, Figure S2A). Within this window, our N-terminomics data confirm the predicted cleavage sites of 748 plastid targeting signals (cTP), 57 thylakoid luminal targeting signals (luTP), 154 mitochondrial presequences (mTP) and 185 secretory signal peptides (SP) (Figure S2B). These data are compiled in Supplemental Table S4.
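A minimal sketch of the ±5 residue comparison follows. It assumes a 1-based P1 position for each predicted cleavage site and takes the residue after P1 as the predicted mature N-terminus. This sign convention and the data layout are assumptions for illustration, and the protein identifiers are hypothetical.

```python
WINDOW = 5  # tolerance in amino acids around the predicted cleavage site


def cleavage_difference(observed_start: int, predicted_p1: int) -> int:
    """Signed offset (in residues) between an observed N-terminus and a prediction.

    observed_start: 1-based position of the first residue of the observed N-terminal peptide.
    predicted_p1: 1-based position of the predicted P1 residue (last residue of the
    targeting peptide), so the predicted mature N-terminus is predicted_p1 + 1.
    0 means exact agreement (sign convention assumed for this sketch).
    """
    return observed_start - (predicted_p1 + 1)


def confirm_sites(observations, predictions, window=WINDOW):
    """Return {protein: (signal type, difference)} for proteins with an observed
    N-terminus within +/- window residues of the predicted cleavage site. If several
    N-termini qualify, the one closest to the prediction is reported."""
    confirmed = {}
    for protein, starts in observations.items():
        if protein not in predictions:
            continue
        signal, p1 = predictions[protein]
        diffs = [cleavage_difference(s, p1) for s in starts]
        in_window = [d for d in diffs if abs(d) <= window]
        if in_window:
            confirmed[protein] = (signal, min(in_window, key=abs))
    return confirmed


# Hypothetical input: observed start positions and predicted (signal type, P1) per protein
observations = {"prot_A": [1, 48], "prot_B": [30], "prot_C": [2]}
predictions = {"prot_A": ("cTP", 45), "prot_B": ("mTP", 20), "prot_C": ("SP", 22)}
print(confirm_sites(observations, predictions))  # {'prot_A': ('cTP', 2)}
```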
Among the confirmed plastid proteins, approximately 42% are N-terminally acetylated (317 protein isoforms, Supplemental Table S4). Notably, N-termini identified around a plastid transit peptide cleavage site matched the predicted cleavage site exactly in only approximately 47% of cases (Figure 2A). This percentage is strikingly higher in all other cases, with almost 70% for thylakoid luminal transit peptides (Figure 2B) and around 65% for mitochondrial presequences and secretory signal peptides (Figures 2C, D). Further, approximately 43% of the N-termini of plastid proteins within the chosen window deviate by 1-5 amino acids upstream of the predicted cleavage site, with frequency decreasing with distance. This effect is less pronounced for luminal targeting sequences (approximately 33%), mitochondrial presequences (approximately 22%) and secretory signal peptides (approximately 32%). Consequently, the distribution of differences between predicted and observed plastid transit peptide cleavage sites (Figure 2A) suggests successive proteolytic postprocessing of plastid proteins after cleavage of their transit peptide. A similar pattern, with multiple cleavage sites around predicted plastid transit peptide cleavage sites, has also been observed in Arabidopsis (Bienvenut et al. 2012; Rowland et al. 2015).

Figure 2 (caption excerpt, panels E-H): (E) Bar chart depicting the distribution of plastid protein isoforms with confirmed cleavage of a plastid targeting peptide and their identified N-terminal modifications. Percentages relate to the total number of identified proteins with a cleaved N-terminal plastid targeting peptide (748) within a window of ±5 amino acids around a predicted cleavage site. (F-H) Frequency of identified N-termini around a predicted plastid transit peptide cleavage site that are acetylated (F), unmodified (free, G) or monomethylated (H). Cum. [%]: cumulative percentage (red points). All data are available from Supplemental Table S4.
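The per-offset and cumulative percentages plotted in Figure 2 could be derived from such offsets roughly as follows. This is a hypothetical sketch. The offsets are invented, and the direction in which percentages are accumulated is an assumption.

```python
from collections import Counter


def difference_profile(differences, window=5):
    """Percentage of N-termini at each offset in [-window, window] plus a running
    cumulative percentage, accumulated here from the most negative offset upward
    (the accumulation direction is an assumption of this sketch)."""
    in_window = [d for d in differences if abs(d) <= window]
    counts = Counter(in_window)
    total = len(in_window)
    profile = []
    cumulative = 0.0
    for offset in range(-window, window + 1):
        pct = 100.0 * counts[offset] / total if total else 0.0
        cumulative += pct
        profile.append((offset, pct, cumulative))
    return profile


# Hypothetical offsets; 7 lies outside the window and is ignored
for offset, pct, cum in difference_profile([0, 0, 0, 1, 2, -1, 7]):
    if pct:
        print(f"{offset:+d}: {pct:.1f}% (cum. {cum:.1f}%)")
```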

N-terminal modifications of plastid and mitochondrial proteins
Among the proteins with an identified plastid transit peptide cleavage site, 42% of protein isoforms had an acetylated N-terminus and almost 10% a monomethylated N-terminus (Figure 2E). Strikingly, acetylated and non-modified (free) plastid N-termini both show this successive cleavage pattern, whereas monomethylated N-termini do not (Figures 2F-H).
This raises the question whether this processing occurs on both acetylated and free N-termini, or whether only free N-termini are processed and subsequently acetylated. One explanation would be that Nα-acetylation of plastid proteins is incomplete and affects only a fraction of each protein isoform. Indeed, many N-termini of plastid proteins were identified in this study in a dual state, both acetylated and free (e.g., Pp3c18_19140V3.1, Pp3c15_7750V3.4; Supplemental Table S4). In this case, Nα-acetylation would prevent N-terminal trimming, whereas the fraction with an unmodified N-terminus would be proteolytically processed to different extents and subsequently acetylated. This scenario may be supported by the fact that Nα-acetylated and free N-termini share a similar preference for N-terminal amino acids (Figure 3), with alanine and serine being the most prominent. Notably, the relative amino acid frequency of the N-terminally acetylated plastid proteins is strikingly similar to that in Arabidopsis (Huesgen et al. 2013). Although the plastid protease inventory is under active investigation (Meinnel and Giglione 2022; van Wijk 2024), a specific protease for such N-terminal trimming has not yet been identified. Aminopeptidases identified in Arabidopsis were recently proposed to also confer trimming functions (Rowland et al. 2015; Meinnel and Giglione 2022), but their activity was investigated on released plastid transit peptides in conjunction with presequence proteases (Teixeira et al. 2017), not on the N-terminus of the corresponding mature protein.

Figure 3 (caption excerpt): Data from Supplemental Table S4. "n" represents the total number of non-redundant sequences. Sequences were aligned at the identified N-terminal amino acid.
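The positional amino acid frequencies behind Figure 3 (sequences aligned at the identified N-terminal residue, with n counting non-redundant sequences) could be computed along these lines. The function name and example sequences are hypothetical.

```python
from collections import Counter


def positional_frequencies(nterm_peptides, length=5):
    """Relative amino acid frequencies (%) at positions 1..length of N-terminal
    sequences aligned at their first (N-terminal) residue. Redundant sequences are
    collapsed first, so the returned n counts non-redundant sequences."""
    unique = sorted(set(nterm_peptides))
    table = []
    for pos in range(length):
        residues = [seq[pos] for seq in unique if len(seq) > pos]
        counts = Counter(residues)
        total = len(residues)
        table.append({aa: 100.0 * c / total for aa, c in counts.most_common()})
    return len(unique), table


# Hypothetical N-terminal sequences (one duplicate is collapsed)
n, table = positional_frequencies(["ASTK", "ASVE", "SAAT", "ASTK"], length=2)
print(n)         # 3
print(table[0])  # frequencies at position 1
```

Collapsing duplicates first keeps highly abundant proteins, which are often identified through many redundant peptides, from dominating the frequency profile.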
Alternatively, an Nα-acetylated amino acid might itself be cleaved off, although no protease has yet been clearly shown to act on acetylated N-termini of intact proteins.
However, acylamino acid-releasing enzyme (AARE), a bifunctional serine protease (Tsunasawa et al. 1975; Fujino et al. 2000; Shimizu et al. 2003; Nakai et al. 2012; Hoernstein et al. 2023), has proven activity on Nα-acetylated oligopeptides, and activity on intact proteins has repeatedly been considered (Tsunasawa et al. 1975; Arfin and Bradshaw 1988; Adibekian et al. 2011). Moreover, one Physcomitrella AARE isoform and the Arabidopsis AARE localize not only to the cytoplasm but also to plastids and mitochondria (Hoernstein et al. 2023). This renders AARE an interesting novel candidate for plastid protein processing, especially since this protease has a strong substrate preference for Ac-Ala (Hoernstein et al. 2023; Yamauchi et al. 2003), which is the most frequent Nα-acetylated amino acid of plastid proteins observed here (Figure 3).
We also identified several mitochondrial proteins that were N-terminally acetylated around a predicted transit peptide cleavage site (Supplemental Tables S4, S5). However, in contrast to the well-known Nα-acetylation of plastid proteins, this modification has not yet been identified to a similar extent on mitochondrial proteins, and its occurrence there remains unclear (Giglione and Meinnel 2021). The N-terminally acetylated amino acids were A, T and S, and in all cases they were preceded by a methionine, which may also indicate alternative translation initiation.
Consequently, we considered this not to be an as yet undiscovered modification of mitochondrial proteins, but rather a co-translational modification of a shorter, e.g. cytoplasmic, isoform derived from alternative translation initiation or alternative splicing. In Physcomitrella, both mechanisms are known to target protein isoforms to distinct subcellular localizations (Kiessling et al. 2004; Hoernstein et al. 2023). Apart from these two scenarios, dual targeting of proteins to plastids and mitochondria via ambiguous targeting signals is also conceivable. In this case, the N-terminally acetylated protein would represent the plastid-localized variant. Hence, we investigated these N-terminally acetylated and potentially mitochondria-localized proteins with a focus on alternative translation initiation sites or splice variants that would give rise to shorter, possibly cytoplasmic, protein isoforms (Supplemental Table S5). Subcellular targeting was predicted with Localizer (Sperschneider et al. 2017), and the presence of ambiguous targeting signals was predicted with ATP2 (Fuss et al. 2013). In two cases (Pp3c13_17110V3.1, Pp3c21_2600V3.1), potential dual targeting to plastids and mitochondria was predicted by Localizer and ATP2, but alternative translation initiation from the downstream methionine was also likely (Supplemental Table S5). In most other cases, we found either a potential alternative translation initiation site (Pp3c15_21480V3.1, Pp3c4_3210V3.1) or alternative splice variants (Pp3c22_8300V3.1, Pp3c7_24050V3.2) that allow translation of a shorter open reading frame. For one protein (Pp3c9_14150V3.1, Pp3c9_14150V3.2), the situation remains unclear.
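A first-pass check for N-termini that may arise from a downstream start codon can be sketched as below. The sketch assumes that an observed N-terminus preceded by an internal Met, with a small first residue, is compatible with initiation at that Met followed by methionine excision. The small-residue set is a commonly used simplification, and the sequence is hypothetical. The check does not replace the splice-variant and targeting-predictor analyses described above.

```python
# Residues after which methionine aminopeptidases commonly remove the initiator Met
# (small side chains); used here as a simplified rule of thumb.
SMALL_P1_PRIME = frozenset("GASCPTV")


def internal_methionines(protein_seq: str) -> list[int]:
    """1-based positions of Met residues downstream of the annotated start."""
    return [i + 1 for i, aa in enumerate(protein_seq) if aa == "M" and i > 0]


def consistent_with_alt_start(protein_seq: str, observed_start: int) -> bool:
    """True if an observed N-terminus (1-based) could derive from translation starting
    at an internal Met followed by initiator-Met excision: the preceding residue is an
    internal Met and the observed first residue is small."""
    if observed_start < 3:
        return False  # start index 1 or 2 already corresponds to the annotated start
    preceding = protein_seq[observed_start - 2]
    first = protein_seq[observed_start - 1]
    return preceding == "M" and first in SMALL_P1_PRIME


seq = "MRLSSMATGK"  # hypothetical precursor sequence
print(internal_methionines(seq))          # [6]
print(consistent_with_alt_start(seq, 7))  # True: ...M|A...
print(consistent_with_alt_start(seq, 4))  # False: preceded by L
```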
Overall, the present data suggest that the observed N-terminally acetylated proteins are localized in plastids or the cytoplasm rather than in mitochondria.
Besides acetylation, we also observed N-terminal methylation on plastid and mitochondrial proteins. Strikingly, monomethylation of plastid proteins was identified predominantly on N-terminal alanine and serine, but also on N-terminal methionine (Figure 3). The absence of a successive cleavage pattern like that observed for free or acetylated N-termini may indicate a stabilizing effect of monomethylation on the modified protein. This modification is found in eukaryotes and prokaryotes (Stock et al. 1987) but is poorly investigated in plants. It has been demonstrated for the small subunit of Rubisco (RbcS) in pea, spinach, barley and corn (Grimm et al. 1997). Accordingly, we found the N-terminal methionine of RbcS (after transit peptide removal, Pp3c12_19890V3.4, Supplemental Figure S3) to be monomethylated in Physcomitrella, suggesting evolutionary conservation of this modification on RbcS.
Apart from these plastid proteins, we also identified 49 mitochondrial proteins that are monomethylated at an N-terminus matching the predicted presequence cleavage site (Supplemental Table S4), including cytochrome c oxidase subunit 5B (COX5B, Pp3c19_11870V3.1, Figure S4). Serine was strongly overrepresented as the N-terminally monomethylated amino acid (Figure 3), whereas alanine and serine were equally frequent on non-modified N-termini of mitochondrial proteins. This specificity of methylated proteins in plastids and mitochondria only partially resembles the specificity of human NTM1A (N-terminal Xaa-Pro-Lys N-methyltransferase 1; UNIPROT: Q9BV86) (Schaner Tooley et al. 2010; Wu et al. 2015), which methylates N-terminal alanine and serine when they are followed by a proline and a lysine. In our data, proline and lysine were not frequently observed as subsequent amino acids (Figure 3). Intriguingly, the observed amino acid frequency of methylated plastid and mitochondrial proteins rather resembles the situation in yeast (Chen et al. 2021). Apart from RbcS, N-terminal methylation of plant proteins has been reported only for cytosolic and plastid ribosomal subunits and histones (Carroll et al. 2008; Webb et al. 2010). Moreover, these reported N-terminal sequences (Webb et al. 2010) share almost no homology with the methylated N-termini observed in our data. Nevertheless, we identified the cytosolic ribosomal subunit RPL19 as methylated at its mature N-terminus (Pp3c18_14440V3.1, Figure S5), as well as other likely cytosolic proteins (Supplemental Table S3).
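How often methylated N-termini match the human NTM1A motif described above could be quantified by counting motif matches, as in this Python sketch. The regular expression encodes only the A/S-P-K pattern stated in the text, and the example sequences are hypothetical.

```python
import re

# Recognition motif as described in the text for human NTM1A:
# N-terminal Ala or Ser followed by Pro and Lys.
NTM1A_MOTIF = re.compile(r"^[AS]PK")


def motif_fraction(nterm_peptides):
    """Fraction of non-redundant N-terminal sequences that match the motif."""
    unique = set(nterm_peptides)
    if not unique:
        return 0.0
    return sum(1 for s in unique if NTM1A_MOTIF.match(s)) / len(unique)


# Hypothetical methylated N-termini
print(motif_fraction(["SPKR", "APKE", "SAAT", "MDTQ"]))  # 0.5
```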
Lastly, we also identified several proteins as methylated at their N-terminus after cleavage of a predicted secretory signal peptide (Supplemental Table S4). Most of these appeared to be false-positive identifications caused by isotope peak errors and were therefore not considered further.
Knowledge of protein N-terminal methylation, especially in plants, is scarce, and until now the methylation of RbcS was regarded as an exception (Grimm et al. 1997; Petkowski et al. 2013). In contrast, our data reveal that this modification affects several plastid and mitochondrial proteins with similar specificity of the methylating enzyme. In humans, two N-terminal methyltransferases are known so far, NTM1 and NTM2 (Schaner Tooley et al. 2010; Petkowski et al. 2013). To investigate whether homologues exist in Physcomitrella, we used the sequence of human NTM1 (UNIPROT: Q9BV86) as a query for a BlastP search (Altschul et al. 1997) against all Physcomitrella V3.3 protein models (Lang et al. 2018) in the Phytozome database (https://phytozome-next.jgi.doe.gov/) (Goodstein et al. 2012). This search identified a single protein (Pp3c22_8670V3.1; identity 36%, alignment length 74 amino acids) sharing the same protein family annotations as human NTM1A (InterPro: Alpha-N-methyltransferase NTM1 (IPR008576); S-adenosyl-L-methionine-dependent methyltransferase (IPR029063)). In a reciprocal BlastP search using Pp3c22_8670V3.1 (hereafter referred to as PpNTM1) as a query, human NTM1 appeared as the best Blast hit, confirming the orthology. A full InterPro search against all Physcomitrella V3.3 protein sequences (Lang et al. 2018) did not reveal any further proteins with this protein family annotation. In agreement, PpNTM1 was also the only Blast hit when the sequence of NTM2 (UNIPROT: Q5VVY1), a human homologue of NTM1, was used as a query. In Physcomitrella, PpNTM1 is expressed at moderate levels in all major tissues (Supplemental Figure S6). Interestingly, PpNTM1 is predicted via TargetP2.0 (Armenteros et al. 2019) to localize to mitochondria, whereas plastid localization is predicted by Localizer (Sperschneider et al. 2017) (Supplemental Table S6). We also found only a single Arabidopsis homologue (AT5G44450.1), which both predictors assign to plastids.
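The reciprocal best-hit criterion used above to infer orthology can be sketched as follows, assuming BLAST results have already been parsed into (subject, bit score) pairs. Identifiers and scores are hypothetical.

```python
def best_hit(hits):
    """hits: list of (subject_id, bit_score); returns the subject with the highest score."""
    return max(hits, key=lambda hit: hit[1])[0] if hits else None


def is_reciprocal_best_hit(query, forward_hits, reverse_hits):
    """forward_hits: hits of `query` against the target proteome.
    reverse_hits: {target_id: hits of that target against the query proteome}.
    The pair is a reciprocal best hit if the target's own best hit is the query."""
    target = best_hit(forward_hits)
    if target is None:
        return False
    return best_hit(reverse_hits.get(target, [])) == query


# Hypothetical identifiers and bit scores for illustration
forward = [("moss_1", 95.0), ("moss_2", 41.0)]
reverse = {"moss_1": [("human_1", 93.5), ("human_2", 60.2)]}
print(is_reciprocal_best_hit("human_1", forward, reverse))  # True
```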
Moreover, a potential alternative translation initiation site might be located at M56 (Kozak Similarity Score ≥ 0.7 and < 0.8; Gleason et al. 2022a, b), which would not interfere with the predicted domain structure and would enable cytosolic localization. We further investigated whether residues with known catalytic function in the human isoform (Dong et al. 2015; Wu et al. 2015) are conserved in plant homologues. Surprisingly, only one of the seven known catalytic sites is conserved in Physcomitrella (Supplemental Figure S7A), whereas all except one are conserved in Arabidopsis. Notably, Arabidopsis NTM1 has the three-amino-acid motif "EPV" where the human isoform has the motif "DIT" (Supplemental Figure S7A). In turn, the EPV motif seems to be conserved in all other plant species analyzed here, except in Physcomitrella (Supplemental Figure S7A). Finally, we checked the structure predictions from AlphaFold (Jumper et al. 2021; Varadi et al. 2024). Whereas human and Arabidopsis NTM1 share obvious structural similarities (Supplemental Figure S7B, C), the predicted structure of Physcomitrella NTM1 is different, albeit of poor prediction quality (Supplemental Figure S7D). Nevertheless, a Foldseek search (van Kempen et al. 2023) for structures similar to PpNTM1 yielded NTM1 isoforms from other species such as rice (UNIPROT: Q10CT5). Consequently, it is not yet clear whether PpNTM1 is the methyltransferase responsible for the monomethylation observed here and whether it might be dually targeted to plastids and mitochondria, at least in Physcomitrella. Our present data do not provide any support for predicted transit peptide cleavage or alternative translation initiation for this protein. Hence, further research is required to investigate the molecular function and localization of PpNTM1. Currently, we envision two possible scenarios: i) The deviant Physcomitrella NTM1 is responsible for the monomethylation of N-termini observed here. In this case, targeted gene ablation based on highly efficient homologous recombination (Hohe et al. 2004) would result in knockout mutants with no or drastically reduced monomethylated N-termini. ii) At least one enzyme besides NTM1 is responsible for the observed monomethylation of N-termini in Physcomitrella, and possibly also in other plants.
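The conservation check of catalytic residues (Supplemental Figure S7A) amounts to mapping reference positions onto alignment columns and comparing residues. This sketch assumes a precomputed pairwise alignment given as gapped strings. The alignment fragment and site positions are hypothetical and do not correspond to the real NTM1 sites.

```python
def ungapped_to_column(aligned_ref: str) -> dict[int, int]:
    """Map 1-based ungapped positions of the reference to alignment column indices."""
    mapping, pos = {}, 0
    for col, aa in enumerate(aligned_ref):
        if aa != "-":
            pos += 1
            mapping[pos] = col
    return mapping


def site_conservation(aligned_ref: str, aligned_other: str, sites):
    """For each reference site (1-based, ungapped), report whether the aligned
    residue in the other sequence is identical. Both strings must come from the
    same alignment and therefore have equal length."""
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    cols = ungapped_to_column(aligned_ref)
    return {p: aligned_ref[cols[p]] == aligned_other[cols[p]] for p in sites}


# Hypothetical alignment fragment and site positions
print(site_conservation("GKD-LSH", "GRDELSH", [2, 3, 5]))  # {2: False, 3: True, 5: True}
```

Working in reference coordinates keeps the site list stable when gaps are inserted in the alignment.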

Conclusion
In the present study we used a compilation of 24 proteomic datasets obtained from various experiments to provide first insights into the N-terminome of the model plant Physcomitrella.
We found that the percentage of N-terminal acetylation of cytosolic proteins appears slightly lower than the estimated percentage in Arabidopsis. Our data confirm hundreds of predicted targeting peptide cleavage sites that localize proteins to plastids, mitochondria and the secretory pathway. These data can now be used to optimize computational targeting predictors and to design customized protein fusions for targeted localization in biotechnology, and they provide novel insights into potential dual targeting of proteins. Moreover, we reveal N-terminal monomethylation as a previously unknown modification of mitochondrial proteins. The function and impact of this modification remain to be analyzed, but we propose PpNTM1 as a candidate for protein methylation in plastids, mitochondria and the cytosol.

Treatments
Treatment with the proteasome inhibitor epoxomicin was performed on gametophores cultivated on agar plates. Gametophores were harvested and incubated in 10 mL Knop medium containing 20 µM epoxomicin for 24 h (enrichment II, Supplemental Table S1). Red-light treatment (enrichment III, Supplemental Table S1) was performed on hydroponic gametophore cultures, which were incubated for three days in a red-light chamber at 650 nm. Additionally, 50 µM of the proteasome inhibitor MG132 was added to the culture medium at the beginning of the treatment. Dark treatments (enrichments V and VI, Supplemental Table S1) were performed by wrapping the entire boxes of hydroponic gametophore cultures in aluminum foil for the indicated time and cultivating the wrapped boxes further under the same conditions as before. Proteasome inhibition of gametophores during dark treatment (enrichment VI, Supplemental Table S1) was achieved by submerging a hydroponic ring culture entirely in Knop medium containing 100 µM MG132. The box was wrapped in aluminum foil and incubated for 24 h.

Enrichment of nuclei from gametophores
Eighteen g fresh weight (FW) of gametophores were harvested from hydroponic culture and chopped in buffer I containing 1 M 2-methyl-2,4-pentanediol, 10 mM HEPES pH 7.5, 10 mM KCl, 10 mM DTT, 0.1% PVP40 and 0.1% PPI (P9599, Sigma-Aldrich) according to Nelson et al. (1994) using a custom four-razor-blade chopping device. The homogenate was filtered successively through a 40 µm and a 20 µm sieve, and the flow-through was centrifuged for 30 min at 300 x g and 2°C. The supernatant was discarded, and the pellets were carefully resuspended in buffer II containing 110 mM KCl, 15 mM HEPES pH 7.5, 5 mM DTT and 0.1% PPI. The enriched nuclei were further purified on three-step Percoll gradients (100%/60%/30%, 17-0891-01, GE Healthcare, Solingen, Germany) modified after Marienfeld et al. (1989). The Percoll gradients were centrifuged at 200 x g and 2°C for 30 min. The interface between the 100% and 60% layers was recovered, as well as the pellet at the top of the gradient attached to the tube wall. Both fractions were strongly enriched in nuclei; they were therefore pooled, washed with buffer II and centrifuged again for 10 min at 300 x g and 2°C. The pellet containing enriched nuclei was stored at -20°C until further use.
The pellet containing enriched nuclei was dissolved in 50 mM Tris-HCl pH 7.6, 4% SDS, 1% PPI, 50 mM DTT and incubated at 95°C for 10 min. The sample was centrifuged, and the supernatant was acetone-precipitated overnight. All acetone precipitations were centrifuged at 20,000 x g and 0°C for 15 min. The supernatant was discarded, and the protein pellet was washed for 1 h with 1 volume of ice-cold acetone without DTT. The centrifugation step was repeated, and the supernatant was discarded. The protein pellets were air-dried and stored at -20°C for further experiments.

Sequential protein extraction from gametophores
One to two g FW of gametophores were ground in liquid nitrogen for 10-15 min. The fine powder was resuspended in Tris buffer containing 40 mM Tris-HCl pH 7.6, 0.5% PVPP and 1% PPI. The homogenate was sonicated for 15 min and then centrifuged at 20,000 x g and 4°C for 30 min, and the supernatant (Tris extract) was recovered. The remaining pellet containing cell debris was resuspended in 40 mM Tris-HCl pH 7.6, 2% Triton X-100, 1% PPI and sonicated again for 15 min. After another centrifugation at 20,000 x g and 4°C for 30 min, the supernatant (Triton extract) was recovered. Protein concentrations of the extracts were determined directly with the Bradford assay (Bradford 1976), and aliquots corresponding to 100 µg protein were precipitated with acetone containing DTT as described before.

Enrichment of N-terminal peptides
The dimethylation reaction was carried out according to Kleifeld et al. (2010) with some modifications. Protein pellets were dissolved in 100 mM HEPES-NaOH pH 7.5, 0.2% SDS.
Cysteine residues were reduced using Reducing Agent (NP0009, Life Technologies™, Carlsbad, USA) at 1:10 and 95°C for 10 min or Bond-Breaker® (77720, Thermo Scientific) at 1:100 and 28°C for 30 min. Alkylation was performed with iodoacetamide at a final concentration of 100 mM for 20 min at RT. Dimethylation was carried out by adding 2 µL of a 4% formaldehyde solution (Formaldehyde-13C,d2 solution, 596388, Sigma-Aldrich, or Formaldehyde-D2, DLM-805-PK, Cambridge Isotope Laboratories Inc.) and 2 µL of a 500 mM NaCNBH3 solution per 100 µL sample at 37°C for 4 h. The same volumes of formaldehyde and NaCNBH3 were then added again, and the reaction was continued overnight at 37°C. The reaction was stopped by adding 2 µL of a 4% NH4OH solution per 100 µL sample for 1 h at 37°C. Afterwards, the samples were acetone-precipitated as described before, using acetone without DTT, for at least 3 h at -20°C. The final enrichment was modified after McDonald and Beynon (2006).
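Because free amines are blocked by dimethylation, the N-terminal mass shift distinguishes natural N-terminal states. A free α-amine receives two label methyl groups, an endogenously monomethylated one receives a single label methyl, and an acetylated one receives none (lysine side chains are labeled regardless). The following sketch computes these shifts for the two heavy formaldehyde reagents named above. It assumes non-deuterated NaCNBH3 as the reductant and uses monoisotopic masses. It is an illustrative calculation, not part of the original search settings.

```python
# Monoisotopic masses (Da)
C12 = 12.0
C13 = 13.00335484
H1 = 1.00782503
D = 2.01410178   # deuterium
O16 = 15.99491462

# Net mass added per methyl group when N-H becomes N-CH(X)2 by reductive methylation
# with formaldehyde C(X)2O and non-deuterated NaCNBH3: the carbon and both X atoms come
# from formaldehyde, and the hydride H added to carbon offsets the H lost from nitrogen.
PER_LABEL_METHYL = {
    "CD2O": C12 + 2 * D,    # about +16.0282 per methyl
    "13CD2O": C13 + 2 * D,  # about +17.0316 per methyl
}
NATIVE_METHYL = C12 + 2 * H1     # endogenous monomethylation, +CH2 (about 14.0157)
ACETYL = 2 * C12 + 2 * H1 + O16  # +C2H2O (about 42.0106)


def nterm_shift(reagent: str, native_state: str) -> float:
    """Total N-terminal mass shift after labeling, relative to a free alpha-amine."""
    label = PER_LABEL_METHYL[reagent]
    if native_state == "free":
        return 2 * label              # primary amine receives two label methyls
    if native_state == "monomethyl":
        return NATIVE_METHYL + label  # secondary amine receives one label methyl
    if native_state == "acetyl":
        return ACETYL                 # acetylated N-terminus is not labeled
    raise ValueError(f"unknown N-terminal state: {native_state}")


for reagent in ("CD2O", "13CD2O"):
    for state in ("free", "monomethyl", "acetyl"):
        print(reagent, state, round(nterm_shift(reagent, state), 4))
```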

SDS-based enrichment
The dried protein pellets were dissolved in binding buffer according to McDonald and Beynon (2006) (20 mM NaH2PO4, 150 mM NaCl, pH 7.5) containing 0.2% SDS. In-solution digestion was performed using Trypsin (V5280, Promega, Madison, USA), GluC (90054, Thermo Scientific) or Chymotrypsin (V1062, Promega) at an enzyme-to-substrate ratio of 1:25 for 4 h at 37°C (Trypsin, GluC) or 25°C (Chymotrypsin). The ratio was then increased to 1:20 and digestion continued overnight. N-terminally labeled peptides were enriched using 200 µL NHS-Sepharose slurry (17-0906-01, GE Healthcare, Solingen, Germany) per 100 µg protein. The slurry was centrifuged for 30 s at 200 × g, the supernatant was discarded and 400 µL ice-cold 1 mM HCl was added. The slurry was centrifuged again, the supernatant was discarded, and the Sepharose was washed with 1 mL binding buffer without SDS. The samples were applied to the prepared Sepharose and incubated for 4 h at RT. The Sepharose was centrifuged again, and the supernatant was transferred to a new tube containing freshly prepared Sepharose. The used Sepharose was washed with 20 µL binding buffer, and this supernatant was also added to the freshly prepared Sepharose. The enrichment reaction was carried out overnight at 4-8°C. The enriched peptides were desalted using 200 µL C18 StageTips (SP301, Thermo Scientific) supplemented with an additional layer of Empore™ SPE Disk C18 material (66883-U, Sigma-Aldrich). Prior to use, the tips were washed with 100 µL 0.1% TFA and subsequently with 100 µL 80% ACN, 0.1% TFA, and then re-equilibrated with 100 µL 0.1% TFA before the samples were loaded. The remaining Sepharose was washed with 50 µL binding buffer, and this supernatant was also transferred to the tip. The tips were washed with 100 µL binding buffer and the retained peptides were eluted with 300 µL 80% ACN, 0.1% TFA. The eluate was vacuum dried and the samples were stored at -20°C until further analysis.
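The enzyme and NHS-Sepharose amounts above scale with the protein input. The Python sketch below is a hypothetical helper (the function names are ours) that works through this arithmetic. It assumes that "increased to 1:20" means enzyme was added to the same reaction so that the cumulative enzyme-to-substrate ratio becomes 1:20; the methods do not state this explicitly.

```python
def digest_enzyme_ug(protein_ug: float, initial_ratio: float = 25.0,
                     final_ratio: float = 20.0) -> tuple[float, float]:
    """Return (enzyme for the 4 h step, enzyme to add before the overnight step).

    Assumption: the overnight step tops up the same reaction so that the total
    enzyme-to-substrate ratio becomes 1:final_ratio.
    """
    initial = protein_ug / initial_ratio
    total = protein_ug / final_ratio
    return initial, total - initial


def nhs_slurry_ul(protein_ug: float, ul_per_100_ug: float = 200.0) -> float:
    """NHS-Sepharose slurry volume for one enrichment round (200 µL per 100 µg)."""
    return protein_ug / 100.0 * ul_per_100_ug


if __name__ == "__main__":
    first, top_up = digest_enzyme_ug(100.0)
    print(f"{first:.2f} µg enzyme, then add {top_up:.2f} µg")
    print(f"{nhs_slurry_ul(100.0):.0f} µL NHS-Sepharose slurry")
```

For 100 µg protein, this reading gives 4 µg enzyme for the first step and a further 1 µg before the overnight incubation.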

RapiGest-based enrichment
The dried protein pellets were dissolved in 50 mM HEPES-KOH, 0.1% RapiGest surfactant (RPG, 18600186, Waters, Milford, Massachusetts, USA). Proteolytic digestion was performed as described above. After digestion, the RPG surfactant was cleaved by acidifying the sample to pH 2 with TFA, as recommended by the manufacturer, for 45 min at 37°C. Insoluble RPG remnants were removed by centrifugation at 13,000 rpm for 10 min at RT. The peptide-containing supernatant was subjected to solid-phase extraction using SampliQ C18 cartridges (1 mL, 100 mg, 5982-1111, Agilent, Santa Clara, USA). Prior to extraction, each cartridge was washed successively with 1 mL 0.1% TFA, 0.1% TFA in 80% ACN, and 0.1% TFA. The peptide solution was then applied and the cartridge was washed with 1 mL 0.1% TFA. The peptides were eluted in 600 µL 0.1% TFA in 80% ACN, vacuum dried, and stored at -20°C until further use. For enrichment of the N-terminal peptides, the dried and purified peptides were dissolved in binding buffer without SDS as described above, and enrichment using NHS-Sepharose was performed accordingly. Finally, the enriched N-terminal peptides were desalted again using SampliQ C18 cartridges as described above, dried, and stored at -20°C until mass spectrometric analysis.
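The centrifugation steps are reported in different units (200 × g for the Sepharose, 13,000 rpm for RPG removal). Converting rpm to relative centrifugal force requires the rotor radius, which is not reported. The Python sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm) with a hypothetical 7 cm radius, purely for illustration.

```python
import math

# Standard conversion constant for RCF = K_RCF * r[cm] * rpm^2
K_RCF = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (× g) for a given speed and rotor radius."""
    return K_RCF * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a given RCF at the given rotor radius."""
    return math.sqrt(rcf / (K_RCF * radius_cm))


if __name__ == "__main__":
    radius = 7.0  # hypothetical rotor radius in cm; not reported in the methods
    print(f"13,000 rpm ≈ {rcf_from_rpm(13000, radius):,.0f} × g")
    print(f"200 × g ≈ {rpm_from_rcf(200, radius):,.0f} rpm")
```

With a 7 cm radius, 13,000 rpm corresponds to roughly 13,200 × g. A different rotor would give a proportionally different value.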

Mass spectrometry
NanoLC-MS/MS analyses were performed on an LTQ-Orbitrap Velos Pro (Thermo Scientific) equipped with an EASY-Spray Ion Source and coupled to an EASY-nLC 1000 (Thermo Scientific).
Peptides were loaded on a trapping column (2 cm × 75 µm ID, PepMap C18 3 µm particles, 100 Å pore size, Dionex, Thermo Scientific) and separated either on a 25 cm EASY-Spray column (25 cm × 75 µm ID, PepMap C18 2 µm particles, 100 Å pore size) with a 30 min linear gradient from 3% to 30% ACN (V5280, Promochem) and 0.1% FA (56302, Thermo Scientific), or, for in-solution digested proteins such as the enriched N-terminal peptides, on a 50 cm EASY-Spray column (50 cm × 75 µm ID, PepMap C18 2 µm particles, 100 Å pore size, Dionex, Thermo Scientific) with a 360 min linear gradient from 3% to 30% ACN and 0.1% FA. MS scans were acquired in the Orbitrap analyzer at a resolution of 30,000 at m/z 400, and MS/MS scans were acquired in the Orbitrap analyzer at a resolution of 7,500 at m/z 400 using HCD fragmentation with 30% normalized collision energy. A TOP5 or TOP10 data-dependent MS/MS method was used. Dynamic exclusion was applied with a repeat count of 1 and an exclusion duration of 30 s, or 2 min for long gradients. Singly charged precursors were excluded from selection. The minimum signal threshold for precursor selection was set to 50,000. Predictive AGC was used with a target value of 10⁶ for MS scans and 5 × 10⁴ for MS/MS scans. The lock mass option was applied for internal calibration using background ions from protonated decamethylcyclopentasiloxane (m/z 371.10124).
Electron-transfer dissociation (ETD) fragmentation was performed with 35% normalized collision energy. A TOP5 data-dependent MS/MS method was used. Dynamic exclusion was applied with a repeat count of 1 and an exclusion duration of 30 s. Singly charged precursors were excluded from selection. The minimum signal threshold for precursor selection was set to 75,000. Predictive AGC was used with a target value of 10⁶ for MS scans and 5 × 10⁴ for MS/MS scans. The ETD activation time was set to 250 ms for doubly, 166 ms for triply and 125 ms for quadruply charged precursors, and the AGC target for fluoranthene was set to 200,000. The lock mass option was applied for internal calibration in all runs using background ions from iron(III) citrate (m/z 263.956311).
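Two of the acquisition settings above involve simple arithmetic that can be checked. The Python sketch below uses illustrative names of our own. It recomputes the protonated decamethylcyclopentasiloxane lock mass from its composition C10H30O5Si5 using standard monoisotopic masses. It also shows that the charge-dependent ETD activation times are consistent with an inverse-charge rule (250 ms × 2/z). That rule is our reading of the reported values, not a stated instrument setting.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956, "Si": 27.9769265325}
PROTON = 1.007276466


def protonated_mz(composition: dict[str, int]) -> float:
    """m/z of a singly protonated ion [M+H]+ for an elemental composition."""
    neutral = sum(MONO[element] * count for element, count in composition.items())
    return neutral + PROTON


def etd_activation_ms(charge: int, ref_ms: float = 250.0, ref_charge: int = 2) -> float:
    """Charge-dependent ETD activation time, scaled as 1/z from the 2+ setting.

    Assumption: the reported times follow an inverse-charge rule; 500/3 is
    about 166.7 ms, close to the reported 166 ms.
    """
    return ref_ms * ref_charge / charge


if __name__ == "__main__":
    # Decamethylcyclopentasiloxane (D5), [(CH3)2SiO]5 = C10H30O5Si5
    d5 = {"C": 10, "H": 30, "O": 5, "Si": 5}
    print(f"D5 lock mass: m/z {protonated_mz(d5):.5f}")
    for z in (2, 3, 4):
        print(f"{z}+: {etd_activation_ms(z):.1f} ms")
```

The computed lock mass agrees with the reported m/z 371.10124 to within 10⁻⁵.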

Raw data processing and database search
Raw data were processed with Mascot Distiller (V2.8.3.0, https://www.matrixscience.com/), and database searches were performed using Mascot Server (V2.7.0, https://www.matrixscience.com) against a database containing all V3 Physcomitrella protein models (Lang et al. 2018) together with their reversed sequences as decoys. In parallel, a search was performed against a database of known contaminants such as keratins (269 entries, available on request). Semi-specific cleavage rules were applied to all samples, because N-termini generated by in vivo processing, such as targeting peptide cleavage, do not follow the cleavage rule of the protease used. For tryptic digests, the specificity was set to semi-ArgC.
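The Python sketch below illustrates two parts of the search setup: reversed-sequence decoy construction and a semi-specific Arg-C check. In a semi-specific search, a peptide is accepted if at least one of its two ends follows the cleavage rule. A processed protein N-terminus therefore only needs a rule-conforming C-terminus. The Arg-C rule used here (cleave after R, not before P) and the REV_ accession prefix are assumptions for illustration. This is not Mascot's internal implementation.

```python
def reversed_decoys(targets: dict[str, str], prefix: str = "REV_") -> dict[str, str]:
    """Build reversed-sequence decoys; the accession prefix is hypothetical."""
    return {prefix + acc: seq[::-1] for acc, seq in targets.items()}


def argc_boundary(protein: str, i: int) -> bool:
    """True if the bond before protein[i] satisfies an Arg-C-like rule.

    Assumed rule: cleave C-terminal to R, but not before P. Protein termini
    (i == 0 or i == len(protein)) always count as specific ends.
    """
    if i == 0 or i == len(protein):
        return True
    return protein[i - 1] == "R" and protein[i] != "P"


def is_semi_argc(protein: str, start: int, end: int) -> bool:
    """Peptide protein[start:end] is semi-specific if at least one end is specific."""
    return argc_boundary(protein, start) or argc_boundary(protein, end)


def is_full_argc(protein: str, start: int, end: int) -> bool:
    """Peptide protein[start:end] is fully specific if both ends are specific."""
    return argc_boundary(protein, start) and argc_boundary(protein, end)


if __name__ == "__main__":
    protein = "MSKRAGLRPEWRTT"  # toy sequence, not a Physcomitrella protein
    print(reversed_decoys({"toy": protein}))
    # A processed N-terminus at G6 with a specific C-terminus: semi-specific only
    print(is_semi_argc(protein, 5, 12), is_full_argc(protein, 5, 12))
```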

Computational analysis
The presence of cleavable N-terminal targeting signals (plastid, mitochondria, secretory) was predicted with TargetP 2.0 (Armenteros et al. 2019) and, in selected cases, with Localizer (Sperschneider et al. 2017). Ambiguous targeting to plastids and mitochondria in Physcomitrella was predicted with ATP2 (Fuss et al. 2013). Potential alternative translation initiation was predicted with TIS (https://www.tispredictor.com/) (Gleason et al. 2022a, b). All plots and tables were created using custom Perl scripts and R (R Core Team 2022).
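The difference between an observed N-terminus and a predicted cleavage site, as used in Figure 2 and Figure S2, can be computed from the start index of the identified N-terminal residue. The Python sketch below rests on several assumptions: a cleavage predicted between residues n and n+1 places P1' at residue n+1 (1-based), the difference is observed minus predicted, and cumulative percentages run in ascending order of difference. These conventions and the example values are hypothetical. The ±5 acceptance window is taken from the text.

```python
from collections import Counter


def cleavage_difference(observed_start: int, predicted_p1_prime: int) -> int:
    """Observed N-terminal position minus predicted P1' position (both 1-based).

    0 means the identified N-terminal residue is exactly the predicted P1'
    residue. The sign convention (negative = upstream) is an assumption.
    """
    return observed_start - predicted_p1_prime


def within_window(diff: int, window: int = 5) -> bool:
    """Accept an N-terminus as a confirmed cleavage site within ±window residues."""
    return abs(diff) <= window


def frequency_table(diffs: list[int]) -> list[tuple[int, int, float]]:
    """(difference, count, cumulative %) rows in ascending order of difference."""
    counts = Counter(diffs)
    total = len(diffs)
    rows, running = [], 0
    for d in sorted(counts):
        running += counts[d]
        rows.append((d, counts[d], 100.0 * running / total))
    return rows


if __name__ == "__main__":
    # Hypothetical example: cleavage predicted between residues 23 and 24,
    # so P1' is residue 24; N-termini observed at residues 24, 24, 25, 22, 31.
    diffs = [cleavage_difference(s, 24) for s in (24, 24, 25, 22, 31)]
    for row in frequency_table(diffs):
        print(row)
    print(sum(within_window(d) for d in diffs), "of", len(diffs), "within ±5")
```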

Figure 1
Figure 1 Overview of identified N-termini, N-terminal modifications and identified cleavage sites of targeting peptides. (A) Frequency of identified N-terminal positions per identified protein accession. The start index represents the position number of the identified N-terminal amino acid in the corresponding protein model. (B) Frequency of the number of identified N-termini per protein. (C) Bar chart depicting the distribution of identified N-terminal modifications. Peptides bearing N-terminal pyro-glutamate (pyroGlu) were only counted if the

Figure 2
Figure 2. Comparison of experimentally observed N-termini with predicted organellar targeting peptide cleavage sites. Depicted are frequencies of identified N-termini around a predicted targeting peptide cleavage site. A difference of 0 indicates that an identified N-terminal amino acid corresponds to the P1' amino acid of a predicted cleavage site. Cleavage sites of plastid (A), thylakoid lumen (B), mitochondrial (C) and secretory (D) targeting signals were predicted with TargetP 2.0. (E) Bar chart depicting the distribution of plastid protein isoforms with confirmed cleavage of a plastid targeting peptide and their identified N-terminal modifications. Percentages refer to the total number of identified proteins with a cleaved N-terminal plastid targeting peptide (748) within a window of ±5 amino acids around a predicted cleavage site. (F-H) Frequency of identified N-termini around a predicted plastid transit peptide cleavage site that are acetylated (F), unmodified (free; G) or monomethylated (H). Cum. [%]: cumulative percentage (red points). All data are available from Supplemental Table S4.

Figure 3
Figure 3 Sequence logos of identified N-termini with different modification states of plastid and mitochondrial proteins. "Dif" indicates the position difference upstream of a predicted plastid or mitochondrial transit peptide cleavage site. Transit peptide cleavage sites were predicted with TargetP 2.0. The prediction data are available from Supplemental Table S4. "n" represents the total number of non-redundant sequences. Sequences were aligned at the identified N-terminal amino acid.

Figure S2
Figure S2 Overview of targeting peptide cleavage site identification and summary of identified targeting peptide cleavage sites. (A) Scheme depicting the calculation of the difference between a predicted targeting peptide cleavage site and the experimentally observed N-terminus. A difference of 0 indicates full agreement between prediction and observation. (B) Summary of identified plastid (cTP), luminal (luTP), mitochondrial (mTP) and secretory (SP) targeting peptide cleavage sites. A difference of ±5 from a predicted cleavage site was accepted. Data are available from Supplemental Table S4.

Figure S3
Figure S3 Higher-energy collisional dissociation (HCD) fragment mass spectrum and fragment mass error distribution of the identified N-terminal peptide of RbcS (Pp3c12_19890V3.4). (A) HCD fragment mass spectrum of the peptide MQVWNPIGMTKFE. A mass shift of +31 indicates a hybrid modification consisting of a post-translationally incorporated methyl group and an additional methyl group introduced by reductive methylation during sample preparation (¹³CD₂CH₂, +31.047208 Da). (B) Fragment mass error distribution of the b- and y-ion series.

Figure S4
Figure S4 Higher-energy collisional dissociation (HCD) fragment mass spectrum and fragment mass error distribution of the identified N-terminal peptide of COX5B (Pp3c19_11870V3.1). (A) HCD fragment mass spectrum of the peptide SGHAASGQLDEFGIATGAER. A mass shift of +31 indicates a hybrid modification consisting of a post-translationally incorporated methyl group and an additional methyl group introduced by reductive methylation during sample preparation (¹³CD₂CH₂, +31.047208 Da). (B) Fragment mass error distribution of the b- and y-ion series.

Figure S5
Figure S5 Electron-transfer dissociation (ETD) fragment mass spectrum and fragment mass error distribution of the identified N-terminal peptide of RPL19 (Pp3c18_14440V3.1). (A) ETD fragment mass spectrum of the peptide GKQISEIKDFLLTAR. A mass shift of +30 indicates a hybrid modification consisting of a post-translationally incorporated methyl group and an additional methyl group introduced by reductive methylation during sample preparation (C₂D₂H₂, +30.043854 Da). (B) Fragment mass error distribution of the b- and y-ion series.
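The composite mass shifts in Figures S3 to S5 can be reproduced from monoisotopic masses. The Python sketch below follows the compositions given in the captions. It assumes the label contributes a ¹³CD₂H methyl (giving ¹³CD₂CH₂, +31.047208 Da) or a CD₂H methyl (giving C₂D₂H₂, +30.043854 Da) on top of the endogenous CH₃. The isotope masses are standard values; the function names are illustrative.

```python
# Monoisotopic masses (u) of the relevant isotopes
MONO = {"H": 1.00782503207, "D": 2.01410177812, "12C": 12.0, "13C": 13.00335483507}


def methyl_shift(carbon: str = "12C", n_deuterium: int = 0) -> float:
    """Net mass added when one amine hydrogen is replaced by a methyl group.

    The methyl carries three hydrogens, n_deuterium of which are 2H; one
    protium leaves the amine, so the net change is C + 3 H/D - H.
    """
    n_protium = 3 - n_deuterium
    return (MONO[carbon] + n_protium * MONO["H"]
            + n_deuterium * MONO["D"] - MONO["H"])


endogenous = methyl_shift()                       # in vivo CH3, net +CH2
hybrid_13c = endogenous + methyl_shift("13C", 2)  # 13CD2H label -> 13CD2CH2
hybrid_12c = endogenous + methyl_shift("12C", 2)  # CD2H label   -> C2D2H2

if __name__ == "__main__":
    print(f"endogenous methyl: +{endogenous:.6f} Da")
    print(f"13C hybrid:        +{hybrid_13c:.6f} Da")
    print(f"12C hybrid:        +{hybrid_12c:.6f} Da")
```

Because the endogenous methyl contributes +14.015650 Da, the labeled methyl accounts for the remaining +17.031558 Da (¹³CD₂H) or +16.028204 Da (CD₂H).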